An Image is Worth More Than a Thousand Words: Towards Disentanglement in The Wild
Unsupervised disentanglement has been shown to be theoretically impossible without inductive biases on the models and the data. As an alternative approach, recent methods rely on limited supervision to disentangle the factors of variation and enable their identifiability. While annotating the true generative factors is only required for a limited number of observations, we argue that it is infeasible to enumerate all the factors of variation that describe a real-world image distribution. To this end, we propose a method for disentangling a set of factors which are only partially labeled, as well as separating the complementary set of residual factors that are never explicitly specified. Our success in this challenging setting, demonstrated on synthetic benchmarks, motivates leveraging off-the-shelf image descriptors to partially annotate a subset of attributes in real image domains (e.g., human faces) with minimal manual effort. Specifically, we use a recent language-image embedding model (CLIP) to annotate a set of attributes of interest in a zero-shot manner and demonstrate state-of-the-art disentangled image manipulation results.
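As a rough illustration of the zero-shot annotation step, here is a minimal sketch using the open-source CLIP package; the attribute list and prompt wording are our assumptions, not the paper's exact setup:

import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical binary attributes of interest for face images.
attributes = {
    "smiling": ["a photo of a smiling person",
                "a photo of a person with a neutral expression"],
    "glasses": ["a photo of a person wearing glasses",
                "a photo of a person without glasses"],
}

image = preprocess(Image.open("face.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    for name, prompts in attributes.items():
        text = clip.tokenize(prompts).to(device)
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1).squeeze(0)
        # probs[0] is the score for the positive prompt: a soft zero-shot label.
        print(f"{name}: {probs[0].item():.2f}")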
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance across a wide range of tasks and domains. Despite this promise, spatial understanding and reasoning--a fundamental component of human cognition--remains under-explored. We propose SpatialEval, a novel benchmark that covers diverse aspects of spatial reasoning such as relationship understanding, navigation, and counting. We conduct a comprehensive evaluation of competitive language and vision-language models. Our findings reveal several counter-intuitive insights that have been overlooked in the literature: (1) Spatial reasoning poses significant challenges where competitive models can fall behind random guessing; (2) Despite additional visual input, VLMs often under-perform compared to their LLM counterparts; (3) When both textual and visual information is available, multi-modal language models become less reliant on visual information if sufficient textual clues are provided.
Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion
Adi Haviv, Shahar Sarfaty, Uri Hacohen, Niva Elkin-Koren, Roi Livni, Amit H. Bermano
We begin by evaluating T2I models' ability to innovate and generalize through controlled experiments, revealing that stable diffusion models can effectively recreate unseen elements with sufficiently diverse training data. Then, our key insight is that concepts and combinations of image elements the model is familiar with, and saw more often during training, are more concisely represented in the model's latent space. We hence propose a method that leverages textual inversion to measure the originality of an image based on the number of tokens required for its reconstruction by the model. Our approach is inspired by legal definitions of originality and aims to assess whether a model can produce original content without relying on specific prompts or having the training data memorized.
Figure 1: Illustration of our approach for measuring image originality using multi-token textual inversion. Original images require more tokens for accurate reconstruction, while common images like Van Gogh's "Starry Night" need only one token.
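A minimal sketch of the token-count originality measure, as we read it from the abstract (not the authors' released code); train_textual_inversion, generate_from_embeddings, and reconstruction_error are hypothetical helpers standing in for a full textual-inversion optimization loop such as the one shipped with Hugging Face diffusers:

# Hypothetical sketch: how many learned pseudo-tokens does a frozen T2I
# model need before it can reconstruct the target image?
def originality_score(image, pipeline, max_tokens=8, error_threshold=0.1):
    """Return the smallest number of pseudo-tokens whose optimized
    embeddings reconstruct `image` within `error_threshold`.
    More tokens needed => less familiar to the model => more original."""
    for k in range(1, max_tokens + 1):
        # Optimize k new token embeddings (textual inversion) so the prompt
        # "<tok_1> ... <tok_k>" regenerates the target image.
        embeddings = train_textual_inversion(pipeline, image, num_tokens=k)  # hypothetical helper
        reconstruction = generate_from_embeddings(pipeline, embeddings)      # hypothetical helper
        if reconstruction_error(reconstruction, image) < error_threshold:    # hypothetical helper
            return k  # common images ("Starry Night") should stop at k=1
    return max_tokens  # not reconstructable even with many tokens: highly original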
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
Eyal Segalis, Dani Valevski, Danny Lumen, Yossi Matias, Yaniv Leviathan
Text-to-image diffusion models achieved a remarkable leap in capabilities over the last few years, enabling high-quality and diverse synthesis of images from a textual prompt. However, even the most advanced models often struggle to precisely follow all of the directions in their prompts. The vast majority of these models are trained on datasets consisting of (image, caption) pairs where the images often come from the web, and the captions are their HTML alternate text. A notable example is the LAION dataset, used by Stable Diffusion and other models. In this work we observe that these captions are often of low quality, and argue that this significantly affects the model's capability to understand nuanced semantics in the textual prompts. We show that by relabeling the corpus with a specialized automatic captioning model and training a text-to-image model on the recaptioned dataset, the model benefits substantially across the board. First, in overall image quality: e.g. FID 14.84 vs. the baseline of 17.87, and 64.3% improvement in faithful image generation according to human evaluation. Second, in semantic alignment, e.g. semantic object accuracy 84.34 vs. 78.90, counting alignment errors 1.32 vs. 1.44 and positional alignment 62.42 vs. 57.60. We analyze various ways to relabel the corpus and provide evidence that this technique, which we call RECAP, both reduces the train-inference discrepancy and provides the model with more information per example, increasing sample efficiency and allowing the model to better understand the relations between captions and images.
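As a rough sketch of the recaptioning step: BLIP below is our stand-in assumption (the paper uses its own specialized captioning model), and the checkpoint name and file paths are illustrative:

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumed stand-in captioner; not RECAP's actual captioning model.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

def recaption(image_path: str) -> str:
    """Replace a noisy HTML alt-text caption with a generated caption."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = captioner.generate(**inputs, max_new_tokens=64)
    return processor.decode(out[0], skip_special_tokens=True)

# The recaptioned (image, caption) pairs then replace the web alt-text pairs
# when training or fine-tuning the text-to-image model.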
A Picture is Worth a Thousand Words: This Microsoft Model can Generate Images from Short Texts
Humans build knowledge in images. Every time we are presented with an idea or an experience, our brain immediately formulates visual representations of it.
Your Photo Of A Burrito Is Now Worth A Thousand Words
That burrito in your hands--so warm, so gooey, the richness cut by cilantro and red-hot spice. Before you take a bite, you'd better take a picture. Multiply that impulse by tens of thousands and you get Yelp's database of images, drawn from burrito joints, cocktail bars, and more. Until recently, Yelp was dependent on users to tag their images with search-friendly metadata. But now, using the kind of deep learning techniques that are transforming the field of AI, Yelp is starting to see the business benefits of using software intelligence to power its listing pages and user recommendations.